b'                                          UNITED STATES DEPARTMENT OF EDUCATION\n\n                                               OFFICE OF INSPECTOR GENERAL\n\n                                                                    REGION V\n\n                                                            111 NORTH CANAL, SUITE 940\n\n                                                              CHICAGO, ILLINOIS 60606\n\n\n                                                                   FAX: (312) 353-0244\n     Audit                                                                                                                                    Investigation\n(312) 886-6503                                                                                                                               (312) 353-7891\n\n\n\n                 MEMORANDUM\n\n                 DATE:             June 17, 2003\n\n                 TO:               Grover J. Whitehurst\n                                   Director, Institute of Education Sciences\n\n\n\n                 FROM:             Regional Inspector General for Audit\n                                   Chicago, IL\n\n                 SUBJECT:          FINAL AUDIT REPORT\n                                   Review of Management Controls Over Scoring of the National\n                                   Assessment of Educational Progress (NAEP) 2000\n                                   Control Number ED-OIG/A05-C0010\n\n                 Attached is our subject final report that covers the results of our review of management\n                 controls over scoring of the National Assessment of Educational Progress 2000\n                 assessment during October 1, 1999, through September 30, 2000. 
We received your\n                 comments concurring with the findings and recommendations in our draft audit report.\n\n                 Please provide the Supervisor, Post Audit Group, Office of Chief Financial Officer and\n                 the Office of Inspector General with quarterly status reports on promised corrective\n                 actions until all such actions have been completed or continued follow-up is unnecessary.\n\n                 In accordance with the Freedom of Information Act (5 U.S.C. \xc2\xa7 552), reports issued by\n                 the Office of Inspector General are available, if requested, to members of the press and\n                 general public to the extent information contained therein is not subject to exemptions in\n                 the Act.\n\n                 We appreciate the cooperation given us in the review. Should you have any questions\n                 concerning this report, please call me at 312-886-6503.\n\n                 Attachment\n\n\n\n\n                       Our mission is to promote the efficiency, effectiveness, and integrity of the Department\'s programs and operations.\n\x0c      Review of Management Controls Over Scoring of the National\n                Assessment of Educational Progress 2000\n\n\n                                   FINAL AUDIT REPORT\n                                            ED-OIG/A05-C0010\n                                                June 2003\n\n\n\nOur mission is to promote the efficiency,                      U.S. Department of Education\neffectiveness, and integrity of the                              Office of Inspector General\nDepartment\'s programs and operations.                                       Chicago, Illinois\n\x0c                                Notice\n\n\n Statements that managerial practices need improvements, as well as other\nconclusions and recommendations in this report, represent the opinions of the\nOffice of Inspector General. 
Determinations of corrective action to be taken\n    will be made by the appropriate Department of Education officials.\n\n In accordance with the Freedom of Information Act (5 U.S.C. \xc2\xa7 552), reports\n  issued by the Office of Inspector General are available, if requested, to\nmembers of the press and general public to the extent information contained\n              therein is not subject to exemptions in the Act.\n\x0cEXECUTIVE SUMMARY ............................................................ 1\n\nBACKGROUND ................................................................... 2\n\nNAEP MANAGEMENT CONTROLS OVER SCORING ARE ADEQUATE ........................... 8\n\n          Monitoring ......................................................... 8\n\n                Recommendations ............................................. 10\n\n          Receipt and Control Process ....................................... 10\n\n          Scoring ........................................................... 10\n\n          Data Quality ...................................................... 11\n\n          Analysis and Reporting ............................................ 12\n\n          Other Issues ...................................................... 13\n\nOTHER MATTERS ............................................................... 13\n\nOBJECTIVE, SCOPE, AND METHODOLOGY ........................................... 14\n\nSTATEMENT ON MANAGEMENT CONTROLS ............................................ 17\n\nATTACHMENTS\n\n          Attachment 1 - Additional Management Control Detail Not Presented\n                         in the Body of the Report ..................... 7 pages\n\n          Attachment 2 - Institute of Education Sciences\' Comments on the Draft\n                         Report ........................................ 3 pages\n
\x0cFINAL AUDIT REPORT                                                                    ED-OIG/A05-C0010\n\n\n\n                                 EXECUTIVE SUMMARY\n\n\nOur audit objectives were to determine whether management controls over scoring of the\nNational Assessment of Educational Progress (NAEP) 2000 assessment were in place and\nadequate to provide reasonable assurance that the assessment results could be relied on during the\nperiod October 1, 1999, through September 30, 2000. Based on the work performed, we\ndetermined that the management controls over scoring of the NAEP 2000 assessment were\nadequate and generally working as intended. However, our audit work disclosed two nonmaterial\nweaknesses regarding the monitoring of mathematics qualification sets and scorer qualifications.\nState assessments required under the No Child Left Behind Act could also benefit from standards\nfor management controls over scoring. We plan to report on this separately. 
The Institute of\nEducation Sciences concurred with our recommendations and its written comments are included\nas Attachment 2 to this report.\n\nTo accomplish our objectives, we (1) obtained background materials and interviewed officials\nfrom the National Center for Education Statistics (NCES), National Assessment Governing Board\n(NAGB), Westat, Educational Testing Service (ETS), and NCS Pearson (NCS) to gain an\nunderstanding of their role in conducting the NAEP 2000 assessment; and (2) gained an\nunderstanding of current Administration and Congressional proposals that could have an effect on\nNAEP such as the Government Performance and Results Act (GPRA), the No Child Left Behind\nAct, the Elementary and Secondary Education Act, and other legislation affecting management\ncontrols. We also gained an understanding of state assessments through interviews with ETS and\nNCS officials. We reviewed and tested management controls over scoring to ensure the\nprocesses were adequate and working as intended. To review management controls over scoring,\nwe interviewed officials to identify the management controls that were in place and reviewed\nvarious documents used in the process. To test the management controls over scoring, we\nexamined the NCS mainframe final data for anomalies, identified the scorers for each subject, and\ninterviewed judgmentally selected scorers. In addition, we reviewed the NCS and ETS computer\nprocessed data to ensure that it was reliable. To determine data reliability, we assessed data\ncompleteness, data authenticity, and the accuracy of computer processing. In addition, we gained\nan understanding of the ETS scoring analysis and reporting process. 
Further, we reviewed\nselected NCS employee payroll, personnel file, and position description records, and NCS\' NAEP\nprofit margin.\n\n\n\n\nJune 2003                     Review of Management Controls Over                      Page 1 of 18\n                  Scoring of the National Assessment of Educational Progress 2000\n\x0c                                         BACKGROUND\nNATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS\nAs mandated by Congress, NAEP surveys the educational accomplishments of U.S. students and\nmonitors the changes in those accomplishments. NAEP, often called the "Nation\'s Report Card,"\nis described as the only nationally representative and continuing assessment of what America\'s\nstudents know and can do in various subjects. NAEP provides a comprehensive measure of\nstudents\' learning at critical junctures in their school experience. The assessment has been\nconducted regularly since 1969 and for over 30 years NAEP has been collecting data to provide\neducators and policymakers with accurate and useful information. Because NAEP makes\nobjective information about student performance available to policymakers at national and state\nlevels, it plays an integral role in evaluating the conditions and progress of the nation\'s education.\n\nOver the years, NAEP has evolved to address\nquestions asked by policymakers, and NAEP now\nrefers to a collection of national and state\nassessments. The collection of assessments\nincludes main NAEP (state and national) and\nlong-term trend NAEP (national).\n\n[Figure: NAEP Components. The National Assessment of Educational Progress (NAEP) branches into MAIN (STATE, NATIONAL) and LONG-TERM (NATIONAL).]\n\n
The main assessments report results for grade\nsamples of fourth, eighth, and twelfth grade\nstudents. They periodically measure students\'\nachievement in reading, mathematics, science,\nwriting, U.S. history, civics, geography, and\nother subjects. In 1997, main NAEP returned to\nannual assessments. In 2000, the main NAEP assessed mathematics and science at grades four,\neight, and twelve and reading at grade four.\n\nThe long-term trend assessments report results for age/grade samples (nine year-olds/fourth\ngrade; thirteen year-olds/eighth grade; and seventeen year-olds/eleventh grade). They measure\nstudents\' achievement in mathematics, science, reading, and writing. Measuring trends of student\nachievement, or change over time, requires the precise replication of past procedures. Therefore,\nthe long-term trend instrument does not evolve based on changes in curricula or in educational\npractices. In 1999, the long-term trend assessment began to be administered on a four-year\nschedule and in different years from the main national and state assessments in mathematics,\nscience, reading, and writing. As a result, in 2000, this assessment was not administered.\n\nInitiated in 1990, state assessments enable participating states to compare their results with those\nof the nation and other participating states. Because the national NAEP samples (main and\nlong-term trend) were not designed to support the reporting of accurate and representative state level\nresults, Congress authorized state assessments. 
State assessments have separate representative\nsamples of students selected for each jurisdiction that participates, to provide these jurisdictions\nwith reliable state level data concerning the achievement of their students. The main national and\nstate assessments use the same assessment booklets. The state NAEP assessment is administered\nin every even year. In 2000, the state NAEP assessed mathematics and science in grades four and\neight.\n\nNAEP has two major goals: to reflect current educational and assessment practices and to\nmeasure change reliably over time. To meet these dual goals, NAEP selects nationally\nrepresentative samples of students who participate in either the main assessments or the long-term\ntrend assessments. These two assessments report information for the nation and for specific\ngeographic regions of the country (Northeast, Southeast, Central, and West). These assessments\nuse distinct data collection procedures, separate samples of students, and test instruments based\non different frameworks. The results are also reported separately.\n\nParticipation in NAEP 2000 was voluntary for states, school districts, schools, teachers, and\nstudents. Some state legislatures mandated participation; others left the option to participate to\ntheir superintendents and other educational officials at the local level. Other states chose not to\nparticipate. Before any student selected to participate actually took the test, the student\'s parents\ndecided whether or not their child would do so. 
Under the No Child Left Behind Act, NAEP\nparticipation is mandatory for all recipients of Title I funds.\n\nNAEP assessments used a combination of multiple-choice and constructed response questions.\nThe multiple-choice questions are electronically scanned and scored. Professional scorers\nevaluate the constructed response questions. The assessments are not designed to provide\nindividual student scores. Each student receives only a small portion of the assessment. The\nassessment sessions last 45 to 90 minutes depending on the subject. The entire assessment\nprocess, from administering the assessments, to analyzing and reporting the results, can take\nanywhere from 9 to 18 months.\n\nSince 1983, the Department of Education (Department) has conducted NAEP through a series of\ncontracts, grants, and cooperative agreements with various entities. The following chart depicts\nthe relationship of these entities for the audit period October 1, 1999, through September 30,\n2000.\n\n[Chart: NAEP ENTITIES, dated March 9, 2001. Under "NAEP Policy": USEd (U.S. Department of Education), NAGB, Office of Educational Research and Improvement, and NCES (National Center for Education Statistics). Under "NAEP Administration": NAEP (National Assessment of Educational Progress); Westat (Sampling and Data Collection); ETS (Educational Testing Service; Development, Scoring, Analysis, and Reporting); NCS Pearson (National Computer Systems; Scoring); AIR (American Institutes for Research; Reporting).]\n\n
NATIONAL CENTER FOR EDUCATION STATISTICS\nThe Commissioner of Education Statistics, who heads the NCES in the Department, is\nresponsible, by law, to carry out the NAEP project through competitive awards to qualified\norganizations. NCES establishes agreements with private companies for test development and\nadministration services. NCES publishes the results of the NAEP assessments and releases them\nto the media and public. 
NCES strives to present this information in the most accurate and useful\nmanner possible, publishing reports designed for the general public and specific audiences and\nmaking the data available to researchers for secondary analyses.\n\nNATIONAL ASSESSMENT GOVERNING BOARD\nIn 1988, Congress established the NAGB to formulate policy guidelines for NAEP. The NAGB,\nappointed by the Secretary of Education but independent of the Department, governs the\nprogram. It is authorized to set policy for the NAEP. NAGB selects the subject areas to be\nassessed, develops guidelines for reporting, and gives direction to NCES. It is required by law to\napprove all assessment questions and review the scoring guides. NAGB monitors the field-testing\nprocess and may suggest changes in assessment questions.\n\nWESTAT\nNCES has a cooperative agreement with Westat. Under this agreement, Westat selects the school\nand student samples, trains assessment administrators, and manages field operations (including\nassessment administration and data collection activities). For the national assessment, Westat\nadministers the assessments and for the state assessment, the individual states administer the\nassessments. For the state assessments, Westat conducts quality control monitoring of the\nassessment administration by either sending staff to schools or calling the state administrators.\n\nEDUCATIONAL TESTING SERVICE\nNCES has an agreement with ETS. Since 1983, NCES has conducted the assessment through a\nseries of contracts, grants, and cooperative agreements with ETS. 
Under these agreements, ETS\nis responsible for developing the assessment instruments, scoring student responses, analyzing the\ndata, and reporting the results. ETS scores the multiple-choice questions and subcontracts with\nNCS to score the constructed response questions. ETS analyzes the scoring data and summarizes\nthe results. ETS then drafts reports for NCES to review and approve.\n\nNCS PEARSON\nNCS, which serves as a subcontractor to ETS, is responsible for printing and distributing the\nassessment materials and for scanning and scoring constructed response questions. NCS handles\nall receipt control, data preparation and processing, scanning, and scoring activities. NCS\nperforms optical scanning of multiple-choice selections, handwritten responses, and other data.\nThis image based scoring system eliminates paper in the scoring process, which also permits\non-line monitoring of scoring reliability and creation of recalibration sets.\n\nAMERICAN INSTITUTES FOR RESEARCH\nAmerican Institutes for Research (AIR), which serves as a subcontractor to ETS, is responsible\nfor development of the background questionnaires. Students, teachers, and principals complete\nthese questionnaires to provide NAEP with data about students\' school backgrounds and\neducational activities. Students answer questions about the courses they take, homework, and\nhome factors related to instruction. Teachers answer questions about their professional\nqualifications and teaching activities, while principals answer questions about school level\npractices and policies. 
Relating student performance on the cognitive portions of the assessments\nto the information gathered on the background questionnaires increases the usefulness of NAEP\nfindings and provides the context for a better understanding of student achievement. AIR did not\nperform work related to our audit objectives; therefore, it was not included in our review.\n\nNO CHILD LEFT BEHIND ACT\nOn January 8, 2002, President Bush signed into law the No Child Left Behind Act of 2001. This\nnew law represents his education reform plan and contains the most sweeping changes to the\nElementary and Secondary Education Act since it was enacted in 1965. It changes the federal\ngovernment\'s role in kindergarten through grade 12 education by asking America\'s schools to\ndescribe their success in terms of what each student accomplishes. The act contains the\nPresident\'s four basic education reform principles: stronger accountability for results, increased\nflexibility and local control, expanded options for parents, and an emphasis on teaching methods\nthat have been proven to work.\n\nAn "accountable" education system involves several critical steps:\n\n\xe2\x80\xa2      States create their own standards for what a child should know and learn for all grades.\n       Standards must be developed in math and reading immediately. Standards must also be\n       developed for science by the 2005-06 school year.\n\n\xe2\x80\xa2      With standards in place, states must test every student\'s progress toward those standards\n       by using tests that are aligned with the standards. 
Beginning in the 2002-03 school year,\n       schools must administer tests in each of three grade spans: grades 3 through 5, grades 6\n       through 9, and grades 10 through 12 in all schools. Beginning in the 2005-06 school year,\n       tests must be administered every year in grades 3 through 8 in math and reading.\n       Beginning in the 2007-08 school year, science achievement must also be tested.\n\n\xe2\x80\xa2      Each state, school district, and school will be expected to make adequate yearly progress\n       toward meeting state standards. This progress will be measured for all students by sorting\n       test results for students who are economically disadvantaged, are from racial or ethnic\n       minority groups, have disabilities, or have limited English proficiency.\n\n\xe2\x80\xa2      School and district performance will be publicly reported in district and state report cards.\n       Individual school results will be on the district report cards.\n\n\xe2\x80\xa2      If the district or school continually fails to make adequate progress toward the standards,\n       then they will be held accountable.\n\nThe No Child Left Behind Act required changes in the NAEP assessment schedule. As a result,\nstate participation in NAEP reading and mathematics biennial assessments in grades four and eight\nis required of states participating in Title I. Previously, state NAEP reading and mathematics was\nperformed on a four-year cycle.\n\nGOVERNMENT PERFORMANCE AND RESULTS ACT\nThis audit falls under the context of the GPRA, specifically data quality and reliability. 
To report\nthe NAEP results, data needs to be accurate, complete, and timely, because the Department\'s\nprograms rely on NAEP as a data source. The Department\'s 2000 Performance Report objectives\nidentified Department goals and individual programs that relied on NAEP. Department goals 1\nand 2 had objectives that relied on NAEP as a data source. There were six individual programs\nthat contained objectives that relied on NAEP as a data source. The individual programs\nincluded: (1) Title I Grants for Schools Serving At-Risk Children, (2) Educational Technology\nState Grants, (3) State Assessments, (4) Indian Education, (5) Grants to States and Preschool\nGrants Programs - IDEA Part B, and (6) Perkins Vocational and Technology Education. While\nthe Strategic Plan for 2002-2007 has changed significantly, NAEP is still used extensively as a\ndata source.\n\n\n NAEP MANAGEMENT CONTROLS OVER SCORING ARE ADEQUATE\n\nThe management controls over scoring of the NAEP 2000 assessment were adequate and\ngenerally working as intended for the period October 1, 1999, through September 30, 2000. Our\naudit work confirmed that the management controls provided reasonable assurance that the\nassessment results could be relied upon. However, our audit work did identify two nonmaterial\nweaknesses regarding mathematics qualification sets and scorer qualifications. 
We recommend\nthat the Director of the Institute of Education Sciences (formerly Office of Educational Research\nand Improvement) instruct NCES to (1) improve its monitoring of ETS and NCS for adherence\nto the terms of its cooperative agreements and (2) require NCS to use a qualification set of papers\nfor mathematics and document that the scorers passed a qualification set of papers. We also\nnoted that state assessments required under the No Child Left Behind Act could benefit from\nstandards for management controls over scoring. We plan to report on this separately. This\nreport highlights the management controls. These controls are more comprehensive than\npresented here. For additional details regarding these management controls see Attachment 1.\n\nMonitoring\n\nFor monitoring management controls, we considered NCES\' monitoring of its NAEP Cooperative\nAgreements with ETS and Westat. We also considered ETS\' monitoring of its sub-contract with\nNCS. NCES monitors its NAEP Cooperative Agreements with Westat and ETS through periodic\nmeetings and reports. NCES officials informed us that its NCS monitoring is limited due to travel\nfunds. ETS also monitored NCS through periodic meetings and reports. In addition, ETS\nmonitored the scoring process through on-site assessment experts during the constructed\nresponse scoring at NCS. Our review of monitoring management controls disclosed they were\nadequate except for two nonmaterial weaknesses.\n\nNCES needs to improve its monitoring to ensure adherence to the terms of the NAEP Technical\nApplication. Our review of monitoring management controls disclosed two nonmaterial\nweaknesses where the terms of the NAEP 2000 Technical Application were not met. 
These\nweaknesses included mathematics qualification sets and scorer qualifications.\n\nMathematics Qualification Sets\n\nNCS did not use and/or document mathematics qualifying sets for training on extended\nconstructed response questions as required in the NAEP 2000 Technical Application. Extended\nconstructed response questions are defined as questions worth four points or higher. According\nto the NAEP 2000 Technical Application, Chapter 14, page 10, before scoring live responses to\nextended constructed response questions, each scorer must pass a qualification set of papers to\nensure that he or she was able to score with the acceptable level of reliability.\n\nThe audit disclosed an End of Project Report document that indicated, "NAEP Math did not use\nany qualifying sets for training so everyone that was trained scored. Only two people were\nreleased due to poor performance." In addition, an NCS employee informed us that ETS made\nthe decision that no qualifying sets would be used for mathematics.\n\nETS and NCS officials informed us that practice papers, rather than formal qualification sets,\nwere used to ensure that scorers were able to score with an acceptable level of reliability.\nHowever, the use of practice papers for this purpose was not documented. While the quality of\nscoring was high, it may have been higher had NCS met the requirement for each scorer to pass a\nqualification set of papers to ensure that he or she was able to score with the acceptable level of\nreliability. 
ETS and NCS officials indicated that in the future only sets explicitly identified as\nqualification sets would be used for qualification and that a strict record of qualification\nperformance would be kept.\n\nScorer Qualifications\n\nOur audit work also disclosed that some scorers did not meet scorer qualification requirements.\nWe interviewed 14 scorers of which 12 scored at the grade 12 level. Of these 12 scorers, 8 did\nnot meet the scorer qualification requirements for assessments at the grade 12 level outlined in the\nNAEP 2000 Technical Application. According to the NAEP 2000 Technical Application,\nChapter 14, pages 6 and 7, scorers had to have the following qualifications:\n\n\xe2\x80\xa2      a minimum of a bachelor\'s degree in an appropriate academic discipline, such as\n       mathematics, science, English, or education, and\n\xe2\x80\xa2      demonstrable ability in performance assessment scoring, with\n\xe2\x80\xa2      teaching experience at the elementary or secondary level preferred.\n\nFor assessments at the grade 12 level, special academic experience in the subject being assessed\nwas required. For example, to score the grade 12 science assessment, scorers needed to have\nhigh school science teaching experience, or a university or graduate degree in science or science\neducation.\n\nETS and NCS officials informed us that the available work force at that time could not meet the\nqualification requirements for the grade 12 level. In the Spring 2000 marketplace, individuals\nwith degrees in mathematics, science, and closely related fields, were in high demand and those\ninterested in short-term positions scoring NAEP were difficult to find. The officials also indicated\nthat a formal process for exceptions to the qualification requirements should have been\nimplemented to allow for authorization by NCES. 
While the quality of scoring was high, it may
have been higher had NCS met the qualification requirements for the grade 12 level. Changes to
the new NAEP Cooperative Agreement removed the qualification requirements. However,
NCES could improve its monitoring to ensure adherence to the terms of the Agreement.

Recommendations

We recommend that the Director of the Institute of Education Sciences (formerly Office of
Educational Research and Improvement) instruct NCES to:

1.1     Improve its monitoring of ETS and NCS for adherence to the terms of the cooperative
        agreements.

1.2     Require NCS to use a qualification set of papers for mathematics and document that the
        scorers passed a qualification set of papers.

Receipt and Control Process

Our review of the management controls related to the receipt and control process focused on the
roles of Westat and NCS in ensuring that all assessment booklets sent to the participating schools
were accounted for and returned to NCS for inclusion in its scoring database.1 The receipt and
control process used by Westat and NCS provided reasonable assurance that all assessment
booklets sent to the selected schools were accounted for and returned to NCS.

Scoring

Our review of the scoring management controls considered the roles of ETS and NCS in ensuring
that (1) the correct constructed response rubric and multiple-choice answer keys were used, (2)
scorer qualification requirements were met, (3) scorers were trained, and (4) scorers were
monitored for reliability to ensure the scoring of each question was consistent among the scorers
and over time.
NCS was responsible for scoring the constructed response questions and ETS was
responsible for scoring the multiple-choice questions. Before the assessments were conducted,
ETS performed quality assurance steps related to scoring. These steps included
independent verification of multiple-choice question keys, review of constructed response
questions and scoring rubrics, and review of all multiple-choice and constructed response
questions by members of NAEP subject area committees. Before scoring live responses to
extended constructed response questions, each scorer must pass a qualification set of papers to
ensure that he or she is able to score with the acceptable level of reliability. In addition, ETS and
NCS selected training materials for constructed response scoring, which included anchor,
practice, calibration, and qualification papers for each response to be scored and final scoring
rubrics.2 NCS used these papers to provide scorer training prior to actual scoring of constructed
response questions.

During scoring, NCS used four methods to monitor reliability. These methods included
calibration, backreading, interrater reliability, and trend scoring.3
We determined that NCS' methods for monitoring reliability provided reasonable assurance of
scoring quality and that NCS met the minimum standards for NAEP 2000 regarding interrater
reliability.

1 For additional information on Receipt and Control Process see Attachment 1, page 1.
2 For additional information on Scoring see Attachment 1, pages 1 and 2.
3 Ibid.

ETS performed on-site monitoring at NCS during constructed response scoring. This included
monitoring interrater reliability reports, t-tests, frequency distributions of scores, and the rate of
scoring.4 NCS also used these monitoring tools. The on-site monitoring kept NAEP
management informed of scoring issues or problems.

Data Quality

For data quality management controls, we considered the roles of ETS and NCS. Our
examination was based on interviews and review of documentation.5 ETS performed quality
assurance before the assessments were conducted, on-site monitoring at NCS during constructed
response scoring, database quality assurance on the scoring database during scoring and after
scoring was completed, and quality assurance steps undertaken as part of statistical analysis of data.
NCS performed quality assurance when scanning the assessment booklets into the database for
image scoring, during scoring, and prior to data delivery to ETS.
We also examined computer-processed data for reliability.

The quality assurance steps that ETS performed before the assessments were conducted, which
related to pre-field testing the review process, field-testing the assessments, and preparing a
thorough scoring planning memorandum, ensured that meaningful data would be obtained. If
multiple-choice questions lack single correct answers, or if constructed response questions do not
have solid scoring rubrics, then no scoring or analysis process, no matter how carefully planned and
executed, will yield meaningful data.

ETS performed quality assurance steps before the assessments were conducted that were related
to data quality. These steps were designed to ensure multiple-choice questions have a single
correct answer and constructed response questions have a solid scoring rubric in order to yield
meaningful data.

The on-site monitoring performed by ETS was instrumental in ensuring the quality of the scoring
data as constructed response scoring was being performed. The various reports monitored while
on-site would identify problems with data quality before scoring was completed and the scoring
data sent to ETS.

ETS database quality assurance involved steps taken once the assessment data was sent to ETS.
Many of these steps were designed to ensure that the data has expected characteristics and meets
the basic quality standards before analysis work is completed.

The NCS data quality assurance steps included scanning, scoring, and data delivery.
The NCS
scanning process provided reasonable assurance that the data entered into the database was
accurate. The NCS data quality assurance steps for scoring and data delivery ensured that the
data was accurate.

4 Ibid.
5 For additional information on Data Quality see Attachment 1, pages 2 through 6.

For data quality we examined computer-processed data for reliability. Our testing for data
reliability focused on assessing the competency of the data. To determine data reliability we
assessed data completeness, data authenticity, and the accuracy of computer processing. For
details on our testing see the Objective, Scope, and Methodology section.

As part of our data completeness work, we tested management controls over scoring by
examining the NCS mainframe final data for anomalies, identifying the scorers for each subject,
and judgmentally selecting scorers to interview. Our examination of the data for anomalies
considered many issues, such as (1) the number of scorers by question, subject, scoring date, and
scorer identification number; (2) various scoring scenarios; and (3) scorer consistency by question
and identification number. We identified and reconciled the number of NAEP 2000 constructed
response and multiple-choice questions, number of scorers, and time period for scoring the
constructed response questions to various documents provided by ETS and NCS.

Our examination of the NCS data for anomalies disclosed no issues of concern.
Our
reconciliation of the above information to various documents provided by ETS and NCS
disclosed that they generally reconciled.

The review of data quality management controls disclosed no concerns regarding the reliability
of the data. The quality assurance steps performed by ETS and NCS disclosed no concerns and
provided reasonable assurance that the data was reliable. Our testing for data reliability regarding
data completeness, data authenticity, and the accuracy of computer processing disclosed no concerns.
We compared 100 percent of the scoring data from the NCS mainframe final data to the ETS
Secondary User data. Our testing confirmed that ETS processed all NAEP 2000 scoring data
properly once it received the data from NCS. In addition, we found no anomalies in the NCS data
that caused concern. Our testing disclosed that the ETS Secondary User database accurately
reflected the source records. We determined that the number of assessments received by ETS and
available for use in the Nation's Report Card generally met the Westat sample requirements.

Analysis and Reporting

For analysis and reporting management controls, we considered the role of ETS. Our
examination was based on interview and review of documentation.6 Quality assurance steps
undertaken as part of statistical analysis of data and preparation of reports included three distinct
sets of quality assurance processes. These included a system of formal procedural and statistical
checks on the data analysis process, a thorough series of plausibility checks, and quality assurance
of NAEP reports.
However, the reporting process was outside the scope of this audit so we did
not perform work in this area.

6 For additional information on Analysis and Reporting see Attachment 1, page 6.

Our review of the quality control steps undertaken as part of statistical analysis of the NAEP 2000
data disclosed that the steps were adequate. The procedural and statistical checks on the data
analysis process should provide reasonable assurance that any data abnormalities were caught and
resolved prior to reporting on the data. The quality controls were augmented with computerized
checking that should reduce the likelihood of human error in the process. The plausibility checks,
which compare data to expectations, historical precedent, and data obtained through other
analysis methods, were designed to make sure the data "makes sense", and thereby further
increase the reliability of the data. The statistical analysis process used by ETS provides
reasonable assurance that the data accurately reflects the NAEP 2000 scoring results.

Other Issues

We considered other issues that might affect the management controls such as incentive payments
for scorers and NCS' NAEP profit margin.
To determine whether these issues were of concern
and whether management controls were working as intended, we (1) interviewed 14 NCS scorers;
(2) reviewed NCS position descriptions for a scoring director, a scoring supervisor, a trainer, and
a scorer; (3) reviewed 3 NCS scorer personnel files; (4) reviewed 4 NCS scorer payroll records;
and (5) examined NCS' December 2000 accounting records regarding its NAEP profit margin.
Our work disclosed no concerns regarding incentive payments for scorers or NCS' NAEP profit
margin.


                                      OTHER MATTERS

The state assessments required as a result of the No Child Left Behind Act might benefit from the
NAEP management controls. In addition to the biennial assessments required under NAEP, the
No Child Left Behind Act requires schools receiving Title I funds to have annual state
assessments in mathematics and reading in three grade spans beginning in the 2002-03 school
year. Beginning in the 2005-06 school year, assessments must be administered every year in
grades three through eight in mathematics and reading. States create their own standards for each
subject and must assess every student's progress toward those standards. We believe that each
state's design of this assessment should include some minimum level of management controls over
scoring for uniformity. The Department should consider whether the types of management
controls over scoring used for NAEP are appropriate for state assessments.
We plan to report on
this separately.


                  OBJECTIVE, SCOPE, AND METHODOLOGY

Our audit objectives were to determine whether management controls over scoring of the NAEP
2000 assessment were in place and adequate to provide reasonable assurance that the assessment
results can be relied on during the period October 1, 1999, through September 30, 2000. To
accomplish our audit objectives we

1.     interviewed officials from NCES, NAGB, Westat, ETS, and NCS to gain an
       understanding of their role in conducting NAEP;
2.     reviewed and tested management controls over scoring to ensure the processes were
       working as intended;
3.     reviewed and tested the ETS and NCS computer-processed data to ensure that it was
       reliable;
4.     reviewed background materials related to NCES, NAGB, Westat, ETS, and NCS, such as:
                a.    The NAEP Guide, 1999 Edition
                b.    ETS Standards for Quality and Fairness 2000
                c.    ETS NAEP 2000 Technical Application
                d.    NAEP 1998 Technical Report
                e.    Special Provisions Cooperative Agreement
                f.    ETS Subcontract with NCS Pearson
                g.    NCES Handbook of Survey Methods, September 2001
                h.    NCES Statistical Standards, June 1992 and draft May 2002
                i.    No Child Left Behind Act
                j.    Department's 2000 Performance Report
                k.    NCES Statistics and Assessment
                l.    Government Performance and Results Act of 1993
                m.    Federal Managers Financial Integrity Act of 1982
                n.    Chief Financial Officers Act of 1990
                o.    Government Management and Reform Act 1994
5.     gained an understanding of current Administration and Congressional proposals that could
       have an effect on NAEP, such as GPRA, the No Child Left Behind Act, the Elementary
       and Secondary Education Act, and other legislation affecting management controls;
6.     gained an understanding of the ETS scoring analysis and reporting process;
7.     reviewed selected NCS employee payroll, personnel file, and position description records
       and NCS' NAEP profit margin; and
8.     gained an understanding of state assessments through interviews with ETS and NCS
       officials.

To review management controls over scoring, we interviewed officials to identify the management
controls in place and reviewed various documents used in the process. To test the management
controls, we examined the NCS mainframe final database for anomalies, identified the scorers for
each subject, and interviewed judgmentally selected scorers. We also examined the NCS data to
determine if NCS met the minimum interrater reliability standards and second scoring
requirements. To determine whether management controls were working as intended and
whether there were other issues of concern, we judgmentally selected 14 scorers to interview.
We selected the sample from a universe of 46 reading, 211 mathematics, and 273 science scorers.7

Reliability of Computer-Processed Data

To accomplish our objectives, we relied on computer-processed data.
To determine the reliability
of that data, we assessed data completeness, data authenticity, and the accuracy of computer
processing. We tested data completeness to confirm that the universe contained all scoring data
elements relevant to our audit objectives and that the data transfer from NCS to ETS was
accurate. To test for data completeness, we compared 100 percent of the scoring data from the
NCS mainframe final data to the ETS Secondary User data. We also compared the number of
national and state NAEP sample assessment booklets requested by Westat for each academic area
and grade level to the number of assessment booklets NCS printed and distributed, and to the
number of assessment booklets received by ETS as assessed. See table below for details.


    SESSION                  SAMPLE SIZE          PRINTED/ISSUED       ASSESSED
                             NATIONAL/STATE                            NATIONAL/STATE
    Grade 4 Reading          8,000/0              24,000/12,000        8,504/0
    Grade 4 Mathematics      13,750/112,500       208,000/189,375      14,396/101,764
    Grade 4 Science          15,750/112,500       222,000/192,376      16,749/96,935
    Grade 8 Mathematics      15,750/112,500       208,000/192,375      16,846/97,509
    Grade 8 Science          15,750/112,500       222,000/192,376      16,837/94,055
    Grade 12 Mathematics     13,750/0             39,000/20,625        14,130/0
    Grade 12 Science         15,750/0             55,500/23,626        15,879/0


7 For our judgmental sample selection, we selected scorers from each academic area (reading, mathematics, and
science) and scorer position description (trainer, scoring director, supervisor, scorer).
We also selected scorers that
tended to score a higher number of questions than other scorers and/or scored in more than one academic area.

The results of our testing confirmed that ETS processed all NAEP scoring data properly once it
received the data from NCS, and we are reasonably certain that the data is complete. We
determined that the number of assessments received by ETS and available for use in the Nation's
Report Card met the Westat sample requirements for the national assessments and for states that
participated. The ETS Secondary User database was used for analysis and reporting of NAEP
results.

Our testing of data authenticity determined if the computer data accurately reflected the source
records. To test data authenticity, we randomly selected a sample of 35 assessment booklet
records from the ETS Secondary User database and compared various scoring data to the actual
assessment booklets. We randomly selected 5 assessment booklets from each subject grade level
in the national and state NAEP 2000. The sample universe, subject, and grade levels included
8,504 reading - 4th; 116,160 mathematics - 4th; 114,355 mathematics - 8th; 14,130 mathematics -
12th; 99,570 science - 4th; 104,928 science - 8th; 15,009 science - 12th. The actual universes for
science grades 4, 8, and 12 were, respectively, 113,684, 110,892, and 15,879. Our testing
disclosed no errors in the scoring data and that the correct constructed response rubrics were used
with each question. For multiple-choice questions the scores in the ETS database accurately
reflected the assessment booklet bubble answer.
A bubble answer is the answer circle
that the student must fill in for a question. In addition, the range for bubble answers in the ETS
database accurately reflected the range for bubble answers in the assessment booklets. For example,
if the assessment booklet question provides answer selections that range from A through D,
the ETS database should also provide for selections that range from A through D.
Also, the correct answer in the ETS database accurately reflected the rubric correct answer key.
For constructed response questions we determined that the score point given for the question
response fell within the acceptable rubric range for that question. We did not test to see if the
student was given the correct score for the constructed response questions because scoring is
subjective and may vary depending on the scorer. In addition, we did not test the cluster type
questions because they are a variation of multiple-choice and constructed response questions that
would have the same subjective nature as the scoring for the pure constructed response questions.
Our testing of data authenticity also included tracing the data from the ETS Secondary User
database back to the NCS data to ensure there were not any extra ETS Secondary User data
records that were not supported by NCS data. Our testing disclosed the exact same number of
assessment booklet records in the ETS Secondary User database as there were in the NCS data.
Based on our testing we believe that the ETS database is reliable and accurate.

The steps aimed at the accuracy of computer processing were designed to verify that all relevant
records were completely processed and that computer processing met the intended objectives.
To
verify that all relevant records were completely processed, we performed a 100 percent test of
data elements, as discussed above, and verified that the conversion of items from the NCS database
to the ETS Secondary User database was performed accurately. ETS converted some NCS data
elements, such as scoring labels and question names. Our testing disclosed that all relevant
records were completely processed and accurately converted to meet the intended objectives.

Organizations and Locations

We conducted our audit at (1) NCES' offices in Washington, DC, on October 10, 2001,
December 18, 2001, and November 13, 2002; (2) NAGB's offices in Washington, DC, on
December 19, 2001; (3) Westat in the NCES offices in Washington, DC, on December 20, 2001;
(4) ETS' offices in Princeton, NJ, from February 26, 2002, through March 6, 2002; and (5) NCS'
offices in Iowa City, IA, from April 2, 2002, through June 6, 2002, and on October 22, 2002. We
held an exit conference with officials from NCES, ETS, and NCS on November 13, 2002. We
performed our audit work in accordance with generally accepted government auditing standards
appropriate to the scope of review described above.


                 STATEMENT ON MANAGEMENT CONTROLS

We have made a study and evaluation of the management control structure over scoring of the
NAEP 2000 assessment for the period October 1, 1999, through September 30, 2000. Our study
and evaluation was conducted in accordance with generally accepted government auditing
standards.
For the purpose of this report, we assessed and classified the significant management
control structure into the following categories:

•      Monitoring
•      Receipt and Control Process
•      Scoring
•      Data Quality
•      Analysis and Reporting

The scoring of NAEP is a collaborative effort by several entities, which include NCES, Westat,
ETS, and NCS. The above listed categories of significant management control structures are a
combined effort of these entities. The management of these entities is therefore responsible for
establishing and maintaining the scoring management control structure. In fulfilling this
responsibility, estimates and judgments by the entities' management are required to assess the
expected benefits and related costs of control procedures. The objectives of the system are to
provide management with reasonable, but not absolute, assurance that assets are safeguarded
against loss from unauthorized use or disposition and that the transactions are executed in
accordance with management's authorization and recorded properly, so as to permit effective and
efficient operations.

Because of inherent limitations in any management control structure, errors or irregularities may
occur and not be detected.
Also, projection of any evaluation of the system to future periods is
subject to the risk that procedures may become inadequate because of changes in conditions, or
that the degree of compliance with the procedures may deteriorate.

In our opinion, the management control structure over scoring of the NAEP 2000 assessment for
the period October 1, 1999, through September 30, 2000, taken as a whole, was sufficient to
meet the objectives stated above insofar as those objectives pertain to the prevention or detection
of errors, irregularities or inefficiencies that would be material in relation to the reliability of the
assessment results.

Nonmaterial weaknesses, which in the auditors' judgment are reportable conditions, are included
under the NAEP MANAGEMENT CONTROLS OVER SCORING ARE ADEQUATE section.


FINAL AUDIT REPORT                     ED-OIG/A05-C0010                              ATTACHMENT 1


           ADDITIONAL MANAGEMENT CONTROL DETAIL NOT PRESENTED IN
                          THE BODY OF THE REPORT


Receipt and Control Process

Westat used an Administration Schedule as a control document for the assessments. The
Administration Schedule is used to select the schools, students, and assessments given for
testing. NCS bar coded the assessment booklets, which allowed it to identify which
booklets were sent to each school and to which students they were assigned.
The assessment
administrators verify they have received all materials from NCS. After the assessment is
conducted, the assessment administrator accounts for all assessment booklets and updates
the Administration Schedule using the appropriate administration codes. The
Administration Schedule and the assessment booklets are returned to NCS for scoring.
All booklets are returned to NCS. NCS has a schedule of all assessments. If it does not
receive the assessment booklets in a timely manner, it contacts Westat. Westat then uses
the FedEx tracking system to locate the booklets. All boxes of assessment booklets
received by NCS are scanned using pre-printed shipping labels NCS provided for the
return of the assessment materials. NCS opens the boxes and verifies the contents. NCS
compares the distribution file to the receipt file in order to determine if all assessment
booklets were returned.

Scoring

ETS and NCS selected training materials for constructed response scoring, which
included anchor, practice, calibration, and qualification papers to provide scorer training
prior to actual scoring of constructed response questions. An anchor set of papers is a
collection of questions from prior years with the score reported to illustrate the scoring
for that question. A practice set of papers is a collection of questions from prior years
without the score reported. The scorer scores each question, then the trainer reviews the
result and indicates the correct score along with an explanation for the score. A qualification
set of papers is a collection of questions from prior years without the score reported,
which the scorer will score and the trainer will grade.

During scoring, NCS used four methods to monitor reliability.
These methods included
calibration, backreading, interrater reliability, and trend scoring. Scorers performed
periodic calibration scoring to make sure that similar answers to the same question were
scored consistently. To prevent drift, whenever the scorers took a break longer than 15
minutes they scored a set of calibration papers to refresh their training and reinforce the
scoring criteria. During backreading, scoring supervisors reviewed each scorer's work to
ensure that the scorer applied the scoring criteria consistently across a large number of
responses and over time. NCS officials indicated that scoring supervisors evaluated
about 10 percent of each scorer's work in progress. NCS also used reliability scoring,
often referred to as interrater reliability, to maintain uniformity of scoring and to ensure
that scorer agreement rates met minimum standards. For interrater reliability, a second
rater scores a sample of questions and the agreement between the first and second scores
is compared. If the interrater reliability does not meet minimum standards, that entire
question set is re-scored. An ETS official stated that the minimum standards for NAEP
2000 were 75 percent for a four point or more question, 80 percent for a three point
question, and 90 percent for a two point question.
Six percent of grades four and eight
mathematics and science constructed responses and 25 percent of grade four reading and
grade 12 mathematics and science constructed responses were required to be scored by a
second scorer to obtain statistics on interrater reliability. NCS used trend scoring to
ensure scoring was consistent across years. Trend scoring included steps and checks to
ensure that scoring decisions were consistent with those made in earlier years. For each
trend question used in a previous NAEP cycle, a minimum number of responses in the
base year were scored along with the NAEP 2000 responses. The scoring system
compared the scores assigned in the original cycle with those assigned in NAEP 2000 to
determine comparability of scoring across years. We determined that NCS' methods for
monitoring reliability provided reasonable assurance of scoring quality and that NCS met the
minimum standards for NAEP 2000 regarding interrater reliability.

ETS performed on-site monitoring at NCS during constructed response scoring. This
included monitoring interrater reliability reports, t-tests, frequency distributions of scores,
and the rate of scoring. NCS also used these monitoring tools. Interrater reliability
reports were reviewed daily to provide immediate feedback to the scorers and correct any
scoring difficulties. During the scoring of trend questions, a t-test was performed. If the
t-test was outside the acceptable range of +/- 1.5 of zero, scoring was stopped in order to
determine a plan of action. Generally, the t-test compares the mean score this time with
the mean score from a previous time. If the scorer did not pass, the scorer would be
retrained. For each question, a report could be run that showed the frequency distribution
of the scores.
This report indicated the separate frequencies for first and second scores.\nThe rate of scoring could be monitored using a status tool that displayed the number of\nresponses scored, the number of responses first scored that still needed to be second\nscored, the number of responses remaining to be first scored, and the total number of\nresponses remaining to be scored. This allowed accurate monitoring of the rate of\nscoring and estimation of the time needed to complete the various phases of scoring.\nThe on-site monitoring kept NAEP management informed of scoring issues or problems.\n\nData Quality\n\nBefore the assessments were conducted, ETS performed quality assurance steps related\nto data quality. These included:\n\n\xe2\x80\xa2\t     Pre-field testing the review process that includes independent verification of\n       multiple-choice answer keys; review of constructed response questions and\n       scoring rubrics; and review of all multiple-choice and constructed response\n       questions by members of NAEP subject area committees and measurement\n       specialists, the Instrument Development Committees, NCES, and NAGB.\n\n\xe2\x80\xa2\t     Field-testing of all assessments prior to selection for operational use, which\n       includes administering all potential NAEP assessments to a sample of 500\n       students, evaluating the functioning of constructed response rubrics, and statistical\n       checks to identify problems in keying of multiple-choice assessments.\n\n\xe2\x80\xa2\t     Preparing a scoring planning memorandum that details for NCS the overall\n       structure of the scoring process, ETS statistical and data requirements, 
and a\n       summary of scoring completion and data delivery dates.\n\nThe ETS database quality assurance involved steps taken once the assessment data was\nsent to ETS. Many of these steps were designed to ensure that the data had expected\ncharacteristics and met the basic quality standards before analysis work was completed.\nThe database quality assurance procedures included:\n\n\xe2\x80\xa2\t     Test runs of the database using preliminary data received from NCS and Westat.\n\n\xe2\x80\xa2\t     Review of (a) sampling weights received from Westat, (b) scoring data sent by\n       NCS, (c) sampled booklets to check accuracy of the optical mark reading system,\n       and (d) special control files to check the accuracy of score assignments made in\n       the NCS image-based constructed response scoring system.\n\n\xe2\x80\xa2\t     Resolution of any database issues or problems.\n\n\xe2\x80\xa2\t     Calculation of final scoring reliability figures for technical reporting.\n\nThe NCS data quality assurance steps included scanning, scoring, and data delivery.\nDuring the scanning process, assessment booklets are batched, scanned, and bar code\nread. An NCS official stated that NCS performed diagnostic tests on the scanning\nmachines prior to each new production run. Each production run also included three\nquality assurance check sheets, which are documents placed in the batch and scanned\nalong with the pages from the assessment booklets. NCS used Optical Mark Readers for\nscanning that also included intelligent character recognition. The scanning machine\nnumbered each page scanned in case a page needed to be located later. During the\nscanning process, infrared was used to capture only needed information, such as students\'\nhandwriting, into the data file. The scanning process had two edit phases that included\nmachine edits and image editing. 
Machine edits verified that each page of each\nassessment booklet was present and that each field had an appropriate value. The edit\nprogram checked each assessment booklet number, school code, and other data on the\nbooklet cover for valid value ranges. The edit program then checked each block of the\nassessment booklet for validity and continued through each question within the block.\nEach piece of input data was checked to verify that it was of an acceptable type, that\nvalues fell within a specified range, and that it was consistent with other data values.\nEach scanning machine had built-in recovery methods. Attached to the scanning machine\nwas a portable computer that reported scanning errors.\n\nNCS used image editing to scan pages that the main scanning process was unable to scan.\nIf a document could not be scanned, the information was entered into the system\nmanually. The image editing department also reviewed suspect errors on-line. A suspect\nerror is an indication that an error may exist. Two individuals separately reviewed the\nsuspect error and made a determination regarding its resolution. The two resolutions\nwere then compared to determine if the individuals came to the same conclusion. The\nAdministration Header Schedule (front cover of assessment booklet) had 100 percent\nverification of keyed items. The scanning system incorporated a program called the\ngeneralized batch editing system. This program generated reports of suspect errors. The\nerror correction continued until all errors were corrected. 
The NCS scanning process\nprovided reasonable assurance that the data entered into the database was complete and\naccurate.\n\nNCS uses a scoring tool called Image Capture Environment (ICE). The ICE includes\nsignificant controls to ensure accurate scoring. To ensure that scanned images are\nmatched with the appropriate scoring prompts, the system loads the scanned images into\nthe database with control information, such as the type of booklet, question number, and\nbook number. When the image is captured, it is tied to the control data. The scanned\nimages, which are the responses from the assessment booklets, are merged with another\nfile that contains the question and prompts used for scoring. The scanned image is called\nthe "clip" and the merged file is called the "overlay." The clip is placed in the center and\nthe overlay surrounds the clip. The scoring prompts would consist of the defined scoring\nsystem and the labels "B," "X," "IL," "?," and "OT." The defined scoring system could\nconsist of correct or incorrect or some type of number point value, such as 1 for incorrect\nand 4 for correct. The labels are the special coding categories for unscorable responses.\n\nTo ensure that the appropriate overlay is matched with the right clip, NCS used control\ninformation contained in its databases. The ICE used four databases: scoring, application\nrepository, workflow, and operational. The scoring database contained statistical\ninformation, the application repository database contained information about the overlay,\nthe workflow database contained the scoring information, and the operational database\ncontained information about the other databases. An NCS official identified the question\nto score and the clips for that question were loaded into the workflow database along with\ncontrol information that identifies the individual clips. 
The ICE consults the\napplication repository (definition database) to determine the correct overlay to merge\nwith the clip. The application repository defines the scoring and the labels to be used for\nthe specific question. This includes the scoring rubric, which is used to set up the scorer\nshell dialog box. The ICE software tool gets five clips from the workflow database,\nattaches the overlays from the application repository, and sends the information to a\nscorer. The clips and overlays are loaded into their respective databases based on a\nscoring schedule.\n\nThe ICE tracks scoring, limits access to scoring batches, and runs edit checks. During\nscoring, information was saved to a table in the workflow database. This information\nincluded the score, the scorer\'s identification number, and a time stamp.\n\nNCS had controls in place that limited who could make scoring changes, which enhanced\ndata quality. During scoring, a question may be scored by one to three different scorers.\nSome questions are scored a second time for interrater reliability and/or by a scoring\nsupervisor. The scores in the workflow database are referred to as the reported 1st score,\n2nd score, and original (supervisor) score. The 1st scorers could change their own scores\nand were allowed to go back five questions and make changes. A review queue held up\nto five questions. When a new question was added to the review queue, the oldest\nquestion moved out of the queue and its score was written to the database. Once the\nscore was written to the database, the scoring supervisor was allowed access to backread\nthe score. 
The scoring supervisor was allowed access to the scores up to four hours after\nthe completion of the scoring batch, when the batch was closed. The database allowed\nonly one 1st score to be recorded, zero or one 2nd score, and multiple supervisor scores.\nWhile the NCS database maintains the multiple supervisor scores, only the final\nsupervisor score is included in the data files sent to ETS.\n\nWhile NCS used interrater reliability to ensure scoring quality, it also had steps to ensure\nthe quality and timeliness of the data. A table in the workflow database tracked the\ninterrater reliability for questions so that the scoring supervisor could calculate the\ninterrater reliability percentage. The interrater reliability table was constantly updated\nduring the scoring process so that the scoring supervisor could calculate the interrater\nreliability at any time. The calculation only included the 1st and 2nd scores. The five\nquestions in the review queue were not included in the interrater reliability calculation.\nThe interrater reliability percentage was calculated based on individual questions. An\nindividual question may require more than one scoring batch.\n\nNCS had data quality assurance steps for the batches to ensure that the data was accurate\nand complete. A batch identification number identified the scoring batch. The scoring\nbatch remained in the database until the batch was completely scored. The completion of\nscoring was signified by a prescribed number of scores being entered. The scoring\ninformation was extracted from the database, and quality assurance edit checks were\nperformed to ensure data was accurate and complete. The NCS data quality assurance\nsteps for scoring ensured that the data was accurate and complete.\n\nThe NCS process for data delivery to ETS included steps to ensure ETS had all the\nneeded data and that ETS knew which score to use for analysis and reporting. 
These\nsteps included merging the data from the scoring batches into a file and determining\nwhich score was the official score. The scanned images were not included in the file.\nThe optically read bubbles for the multiple-choice questions were combined with the\nscore given by the scorer for constructed response questions in the file. NCS had\npreviously scanned the assessment booklet multiple-choice bubble answers and converted\nthem into numeric values for ETS to use for scoring. The files sent to ETS were separated\nby national, state, grade level, and subject. Separate files were created for scorer\nidentification, question name, date question scored, and assessment booklet identification\nnumber. The ICE tracked all scorers for each question and identified the question scored.\nThe scorer identification file contained the official score. When NCS created the scorer\nidentification file, it determined which scores became the official scores. To make this\ndetermination, NCS examined the reported 1st score and supervisor score. If the reported\n1st score and the supervisor score were different, NCS made the supervisor score the\nofficial score. If there was no supervisor score, the reported 1st score became the official\nscore. The scorer identification file sent to ETS included the reported 1st, 2nd, and\noriginal scores. Determining the official score ensured that ETS would always know\nwhich score to use in its analysis and reporting. 
The NCS data quality assurance steps for\ndata delivery ensured ETS received all the needed data and that ETS knew which score to\nuse for analysis and reporting.\n\nAs part of our data completeness work, we tested management controls over scoring by\nexamining the NCS mainframe final data for anomalies, identifying the scorers for each\nsubject, and judgmentally selecting scorers to interview. We used the information below\nas part of our examination of the database for anomalies and reconciliation to various\ndocuments provided by ETS and NCS. Our review of the NCS database disclosed the\nfollowing:\n\nSESSION               CONSTRUCTED   MULTIPLE-   CONSTRUCTED   CONSTRUCTED\n                      RESPONSE      CHOICE      RESPONSE      RESPONSE\n                      QUESTIONS     QUESTIONS   SCORERS       SCORING DATES\n\nGrade 4 Reading       46            35          46            March 31, 2000 - April 19, 2000\n\nGrade 4 Mathematics   60            86          211           March 11, 2000 - May 28, 2000\nGrade 8 Mathematics   62            98\nGrade 12 Mathematics  64            100\n\nGrade 4 Science       80            70          273           March 13, 2000 - June 8, 2000\nGrade 8 Science       110           95\nGrade 12 Science      105           91\n\n\nAnalysis and Reporting\n\nThe ETS system of formal procedural and statistical checks was designed to ensure that\nthe data analysis followed the right steps in the right order and that data abnormalities\nwere caught and resolved. 
These checks included item analysis, scorer reliability\nprograms, item calibration, item plots, condition variable processing, and scale score\nestimation. ETS used a variety of automated programs to assist in performing these\nchecks. The plausibility checks are a system of comparing data to expectations, historical\nprecedent, and data obtained through other analysis methods to make sure the results\nmake sense. When NAEP reports are written, statistics, figures, web tools, and other\nmaterials are subjected to quality assurance.\n\n\n\x0cFINAL AUDIT REPORT                                   ED-OIG/A05-C0010                                  ATTACHMENT 2\n\n\n                              UNITED STATES DEPARTMENT OF EDUCATION\n                                  INSTITUTE OF EDUCATION SCIENCES\n\n\n                                                                                                     June 2, 2003\n\n       MEMORANDUM\n\n       To:         Richard J. Dowd\n                   Regional Inspector General for Audit\n                   Chicago, IL\n\n       From:       Grover J. 
Whitehurst\n                   Director, Institute of Education Sciences\n\n       Subject:    Response to Draft Audit Report\n                   Review of Management Controls Over Scoring of the National Assessment of\n                   Educational Progress (NAEP) 2000\n                   Control Number ED-OIG/A05-C0010\n\n       Thank you for the opportunity to respond to your draft report. We are pleased that the\n       study resulted in a determination that management controls over scoring of the NAEP\n       2000 assessment were adequate.\n\n       We concur with the findings and recommendations and have taken steps to ensure that\n       contractor monitoring is improved and that the non-material weaknesses have been\n       addressed. Attached is a response from the National Center for Education Statistics\n       (NCES) that documents the changes and improvements that have been implemented to\n       address OIG findings and recommendations.\n\n\n\n       Attachment\n\n        cc: Richard Rasa, Director, Advisory & Assistance for State & Local Programs, OIG\n            Valerie Plisko, Director, NCES\n\n\n                               555 NEW JERSEY AVE., NW, WASHINGTON, DC. 
20208\n\n\n                      NCES Response to OIG Draft Audit Report ED-OIG/A05-C0010\n\n\n          The draft OIG Audit Report, Review of Management Controls Over Scoring of the\n          National Assessment of Educational Progress 2000, is a thorough review of the adequacy\n          of the quality of management controls over scoring of the 2000 National Assessment of\n          Educational Progress. The audit comments on two nonmaterial weaknesses in the\n          monitoring of the management controls over scoring of the 2000 NAEP in Mathematics\n          and Science. Both are being addressed through improvements NCES has made to\n          directing and monitoring contractor work and are described in this memo.\n\n          1) The first weakness is that \'NCS [now NCS Pearson, the NAEP contractor for scoring]\n          did not use and/or document mathematics qualifying sets for training on extended\n          constructed response questions as required in the NAEP 2000 Technical Application.\' To\n          remedy this weakness for the 2003 NAEP, NCES has ensured that NCS Pearson and\n          Educational Testing Service (ETS), the NAEP contractor for design, analysis, and\n          reporting, are working together to reorganize the training sets for all the extended\n          constructed response items. These activities are being fully documented in contractor\n          monthly reports to NCES. NCS Pearson is keeping complete records regarding\n          qualifying sets of items for scoring.\n\n          In more detail, between 10 and 20 practice papers were pulled from the Practice Set and\n          placed in Qualification Sets of 10. 
If there were fewer than 10 papers remaining in the\n          Practice Set, the Practice Set was supplemented with responses from 2003. The new\n          Training Sets for extended constructed response items were implemented for the scoring\n          of the 2003 NAEP mathematics assessment. Training sets for new extended constructed\n          response items are to contain:\n\n                    Anchor Papers \t      Approximately 10 papers that definitively show the score\n                                         points. These papers have scores printed on them.\n                    Practice Papers \t    Usually 2 sets of 10 papers each that show more of the\n                                         \'gray\' areas. There are no scores printed on these papers.\n                                         Scorers have an opportunity to practice scoring and also\n                                         ask more questions to flesh out their understanding of the\n                                         rubric.\n                    Qualification Papers Usually 2 sets of 10 papers each. There are no scores\n                                         printed on these papers. The scorers must attain an 80%\n                                         correct score to begin scoring the item.\n\n           All of these papers are part of the training set; for each paper, the trainer explains to the\n           scorers why a response was given a specific score. As recommended in further quality\n           control studies, some of the training sets will be further expanded.\n\n           2) The audit report also disclosed that some scorers did not meet scorer qualification\n           requirements in 2000. Under the current scoring contract, NCS Pearson is allowed to\n           substitute scoring experience for some academic qualifications. 
This is due to the\n           difficulty of hiring enough scorers with the previously required academic credentials and\n           the determination by NCES that prior successful experience in scoring and effective\n           training are as critical a prerequisite for consistent, high-quality scoring as possession of\n           an advanced degree or even classroom teaching experience in a specific content area.\n           NCES is monitoring the qualifications of scorers through the contract process. In\n           addition, NCES has added a new level of external evaluation of NAEP scoring quality to\n           the program through the award of a separate Quality Assurance contract to the Human\n           Resources Research Organization (HumRRO).\n\n           We appreciate the opportunity to respond to the findings of the auditors and document the\n           changes and improvements that have been implemented to address these findings.\n'